LINE-1 (L1) elements are the only active and autonomous transposable elements in humans. The core retrotransposition machinery is a ribonucleoprotein particle (RNP) containing the L1 mRNA, with endonuclease and reverse transcriptase activities. It initiates reverse transcription directly at genomic target sites upon endonuclease cleavage. Recently, using a direct L1 extension assay (DLEA), we systematically tested the ability of native L1 RNPs to extend DNA substrates of various sequences and structures. We deduced from these experiments the general rules guiding the initiation of L1 reverse transcription, referred to as the snap-velcro model. In this model, L1 target choice is mediated not only by the sequence specificity of the endonuclease, but also by base-pairing between the L1 mRNA and the target site, which permits the subsequent L1 reverse transcription step. In addition, L1 reverse transcriptase efficiently primes L1 DNA synthesis only when the 3' end of the DNA substrate is single-stranded, suggesting so far unrecognized DNA processing steps at the integration site.

L1 Elements are Endogenous 
Mutagens in the Human Genome 

Transposable elements account for half to two-thirds of the human genome. Among them, LINE-1 (L1) non-LTR retrotransposons form the only autonomous and active family and are the most abundant, representing 17% of our DNA. Each individual genome contains hundreds of potentially active L1 copies, and hundreds of thousands of defective copies, which are truncated, fragmented, and/or mutated. The active copies can proliferate via an RNA-mediated copy-and-paste mechanism called retrotransposition. L1 insertions are intrinsically mutagenic; however, their actual impact on gene expression depends on their specific site of integration. Intergenic or deep intronic insertions often have no detectable effect on genes. In contrast, insertions in exons or regulatory sequences have the potential to profoundly alter gene expression or function, by disrupting coding or cis-regulatory sequences, or by introducing cis-regulatory sequences of their own (transcription factor binding sites, cryptic splicing and polyadenylation sites, etc.). Hence, germline L1 insertions sporadically cause de novo genetic diseases, and somatic L1 retrotransposition in cancer has been shown to contribute to tumor genome dynamics, including driver mutations. Therefore, exploring the mechanisms that influence L1 target choice is crucial to our understanding of L1-driven genome plasticity.

L1 Retrotransposition 
can be Initiated through 
Two Different Pathways 

The L1 replication cycle starts with the synthesis of a bicistronic mRNA coding for the two L1 proteins, ORF1p and ORF2p (Fig. 1). ORF1p is a 40 kDa RNA-binding protein able to form trimers. It exhibits nucleic acid chaperone activity, the function of which has not been elucidated. ORF2p is a large 149 kDa protein with endonuclease (EN) and reverse transcriptase (RT) activities. Both ORF1p and ORF2p bind the L1 RNA to form a stable ribonucleoprotein particle (RNP), the core of the L1 retrotransposition machinery. The L1 RNP can mediate two different integration processes. In the canonical pathway, called target-primed reverse transcription (TPRT), the L1 EN activity produces a nick at the recognized target site in the chromosomal DNA. The RT moiety then extends the liberated 3'-OH group, using the L1 RNA as a template. Reverse transcription is primed within the poly(A) tail of the L1 RNA. L1 EN preferentially cuts DNA at the consensus sequence 5'-TTTTA-3', with nicking occurring at the TpA bond. In an alternative pathway, named endonuclease-independent (ENi) retrotransposition or non-classical L1 insertion (NCLI), reverse transcription is initiated at pre-existing DNA lesions, without the need for endonuclease cleavage. A particular case of this pathway is retrotransposition at telomeres, the natural extremities of chromosomes. Regardless of whether a particular retrotransposition event is initiated by EN or not, the subsequent steps of the reaction, such as second-strand DNA synthesis or ligation of the 3' ends of the neosynthesized DNA to the target DNA, have not been explored yet.

Figure 1. The L1 life cycle. L1 replication starts with the transcription of a bicistronic mRNA (A). The L1 RNA is exported to the cytoplasm (B). ORF1p and ORF2p proteins are translated and bind to the L1 RNA to form L1 ribonucleoprotein particles (RNP) (C). The L1 RNP is imported into the nucleus (D). Integration and reverse transcription occur at the genomic target site. First, the L1 endonuclease (EN) activity nicks the target DNA (red arrowhead, E). Then, the L1 reverse transcriptase (RT) initiates the reverse transcription of L1 RNA (black arrowhead, F). The mechanisms involved in the final steps of this process and the resolution of the integration are still unresolved (G). Partial reverse transcription can lead to 5'-truncated L1 copies.
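As an illustration of the EN consensus mentioned above (5'-TTTTA-3', nicked at the TpA bond), the following minimal Python sketch scans one strand of a DNA sequence for exact matches and reports where the nick would fall. The function names (find_en_nick_sites, released_3prime_end) and the example sequence are hypothetical and introduced here only for illustration. The sketch assumes exact consensus matches on the given strand only, whereas sequences actually cleaved by the L1 EN can deviate from the consensus.

```python
import re

# Consensus given in the text: 5'-TTTTA-3', nicked at the TpA bond.
# A lookahead is used so that overlapping matches are all reported.
CONSENSUS_LOOKAHEAD = re.compile(r"(?=TTTTA)")


def find_en_nick_sites(seq):
    """Return 0-based indices of the A in every exact TTTTA match.

    The nick lies between seq[i - 1] (a T) and seq[i] (the A).
    Only the strand passed in is scanned (illustrative assumption).
    """
    seq = seq.upper()
    return [m.start() + 4 for m in CONSENSUS_LOOKAHEAD.finditer(seq)]


def released_3prime_end(seq, nick_index, length=10):
    """Return up to `length` nucleotides ending at the 3'-OH freed by the nick."""
    seq = seq.upper()
    return seq[max(0, nick_index - length):nick_index]


if __name__ == "__main__":
    example = "GGCTTTTAAGC"  # hypothetical sequence
    for site in find_en_nick_sites(example):
        print(site, released_3prime_end(example, site))
```

The 3' end returned by released_3prime_end is the stretch that would be available to pair with the poly(A) tail of the L1 RNA after cleavage.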



The Snap-Velcro Model of L1 
Reverse Transcription Initiation 

Recently, we explored the mechanism of L1 reverse transcription initiation using a direct L1 extension assay (DLEA). In this approach, native L1 RNPs expressed in, and enriched from, human cells are incubated with oligonucleotide primers of various sequences or structures, and with radioactive dTTP only, for a very short time (less than 5 min). The products are then resolved on sequencing gels to directly visualize the extension of the primer. Because of the short incubation time and the use of dTTP only, the assay focuses on the initiation of reverse transcription. Advantages of this method, compared with previous PCR-based techniques, include its versatility with regard to the primers that can be used and its quantitative nature. Limitations of DLEA are the absence of sequence information and its lower sensitivity.

One of the unresolved questions related to L1 reverse transcription priming was whether, or to what degree, the 3' end of the nicked genomic DNA needs to be accessible and to base-pair with the poly(A) tail of the L1 RNA. Indeed, R2, a related non-LTR retrotransposon that was used to establish the basis of the TPRT model, does not require such complementarity. Although the consensus sequence released upon L1 EN cleavage (5'-TTTT-3') could in principle anneal to the poly(A) tail of the L1 RNA, it is very short for maintaining a stable interaction, and the actual sequences cleaved by the L1 EN can differ significantly from the consensus sequence. To directly address this question, we quantified the efficiency of extension of more than 65 primers by DLEA. Based on the results of these experiments, and on additional analyses of the distribution of polymorphic L1 insertions in the human genome, we proposed the snap-velcro model for L1 reverse transcription initiation. This model is detailed in Figure 2. The efficiency of reverse transcription initiation is influenced by the last 10 nucleotides of the target DNA. The last 4 nucleotides (the snap) contribute the most to this process. Reverse transcription priming is most efficient when the snap consists of 4 Ts (snap closed). However, suboptimal sequences with terminal or internal mismatches can be tolerated (snap open). For terminal mismatches, the efficiency of extension depends on the nature of the base ending the primer (T > C > A > G). These suboptimal sequences can be extended more efficiently if mismatches are compensated by an increased number of matching Ts in the upstream 6 nt (velcro strap fastened). Finally, an important aspect of our results is that priming only occurs if the 3' end of the DNA substrate is single-stranded. Indeed, double-stranded DNA substrates are extended only if they end with a 3' overhang. If the 3' extremity of the target DNA is embedded in duplex DNA, either as a blunt or as a 3'-recessed end, no extension could be detected under the DLEA conditions employed.

Figure 2. Features of the snap-velcro model of L1 reverse transcription priming. (A) Reverse transcription priming only occurs if the DNA substrate is single-stranded. (B) Reverse transcription priming requires base-pairing between the L1 RNA (pink) poly(A) tail and the target-site DNA (green). The snap (bold green) corresponds to the last 4 nucleotides at the 3' end of the DNA primer. The velcro (light green) contains the 6 bases upstream of the snap. The snap is considered closed if all 4 nucleotides are T. The velcro is tightly fastened if the position-weighted T-density is greater than 0.5 (see ref. 28 for the detailed numerical model). The snap-velcro status predicts the efficiency of L1 reverse transcription priming (green arrow).
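The snap and velcro definitions can be made concrete with a short Python sketch that classifies the 3' end of a single-stranded primer. It is only an illustration under stated assumptions: the position weights used for the velcro T-density are placeholders chosen here (linearly increasing toward the snap) and are not the weights of the published numerical model, which is given in ref. 28. All function and constant names are hypothetical.

```python
SNAP_LEN = 4    # last 4 nucleotides at the 3' end of the primer (the snap)
VELCRO_LEN = 6  # 6 nucleotides immediately upstream of the snap (the velcro)

# ASSUMPTION: placeholder weights for velcro positions, listed 5' -> 3'.
# They are NOT the published weights; see ref. 28 for the actual model.
PLACEHOLDER_WEIGHTS = (1, 2, 3, 4, 5, 6)

# Terminal-base preference reported in the text: T > C > A > G (0 = best).
TERMINAL_BASE_RANK = {"T": 0, "C": 1, "A": 2, "G": 3}


def split_primer(primer):
    """Return (velcro, snap) taken from the last 10 nt of the primer."""
    primer = primer.upper()
    if len(primer) < SNAP_LEN + VELCRO_LEN:
        raise ValueError("primer must be at least 10 nt long")
    velcro = primer[-(SNAP_LEN + VELCRO_LEN):-SNAP_LEN]
    snap = primer[-SNAP_LEN:]
    return velcro, snap


def velcro_t_density(velcro, weights=PLACEHOLDER_WEIGHTS):
    """Weighted fraction of velcro positions occupied by T (0.0 to 1.0)."""
    matched = sum(w for base, w in zip(velcro, weights) if base == "T")
    return matched / sum(weights)


def snap_velcro_status(primer, threshold=0.5, weights=PLACEHOLDER_WEIGHTS):
    """Classify a single-stranded primer according to the snap-velcro rules."""
    velcro, snap = split_primer(primer)
    density = velcro_t_density(velcro, weights)
    return {
        "snap_closed": snap == "T" * SNAP_LEN,       # all 4 snap bases are T
        "velcro_fastened": density > threshold,      # T-density above 0.5
        "velcro_t_density": density,
        "terminal_base_rank": TERMINAL_BASE_RANK.get(snap[-1]),
    }


if __name__ == "__main__":
    for primer in ("GCGCGCTTTT", "ACTTTTTTTA"):  # hypothetical primers
        print(primer, snap_velcro_status(primer))
```

The sketch deliberately returns the snap and velcro status separately rather than a single efficiency value, because the text states only the qualitative rules; it also assumes the 3' end is single-stranded, which the model requires for any extension.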



Consequences of the 
Snap-Velcro Model Related 
to Primer-Template Sequence 
Match 

The snap-velcro model indicates that complementarity between the L1 poly(A) tail and the last 10 nucleotides of the target DNA is important for priming of L1 reverse transcription, yet allows sufficient flexibility to accommodate a wide range of potential target sites. It explains the long-standing observation that L1 elements are often flanked by imperfect T-rich sequences significantly longer than expected for the recognition site of the L1 endonuclease. This model also implies that L1 target-site selection relies not only on the sequence specificity of the EN nicking reaction, but also on the subsequent ability of the RT to efficiently extend the cleaved product. This observation has practical and technological implications. Non-LTR retrotransposons fall into two main classes: i) the stringent elements, which always insert into discrete and defined genomic locations with high sequence specificity (such as R1, R2, Tx1L, SART1, TRAS1), and ii) more promiscuous elements, which insert into multiple locations at a very short and degenerate sequence (such as L1). EN domain swapping between two stringent retrotransposons is sufficient to exchange their respective target-site selectivity both in vitro and in vivo. Similarly, structure-driven domain swapping between a stringent EN and the L1 EN moiety has succeeded in modifying the target selectivity of the latter in vitro. However, such EN variants, when reintroduced into a complete L1 element and tested in vivo, are unable to redirect L1 insertions to altered target sites. Instead, they continue to insert in T-rich stretches, indicating that other determinants downstream of the initial EN cleavage contribute to L1 target-site selection. Our results suggest that L1 RT priming specificity could be one of these determinants. Therefore, engineering L1 to achieve site-specific integration in vivo might require taking into account both EN and RT specificities.

Figure 3. Hypothetical mechanisms allowing L1 RNA base-pairing with target-site DNA. (A) ORF2p dimerization leads to simultaneous staggered cuts through its EN activity. The resulting extremities have 3' overhangs, which can anneal to the L1 RNA and prime L1 cDNA synthesis. (B) L1 EN initially makes a single cut, but a DNA-dependent helicase unwinds the target-site DNA strands, enabling L1 cDNA synthesis. (C) Upon a double-strand DNA break, DNA repair factors resect the ends and generate 3' overhangs. These new extremities do not necessarily end with Ts, unlike EN sites. Consequently, base-pairing generally occurs at internal sites within the L1 RNA which show spurious matches with the damaged site. (D) Telomeres naturally end with 3' overhangs. Red arrowhead, EN cut; green, cDNA; pink, L1 RNA.

Consequences of the 
Snap-Velcro Model Related to the 
Accessibility of the Target DNA 

A second important feature of the snap-velcro model is the requirement for a single-stranded 3' overhang to initiate reverse transcription at the target site. The original TPRT model stipulates that retrotransposition is initiated by a nick in the target DNA. How does the 3'-OH extremity of the substrate, which is embedded in the duplex DNA, become available for the L1 RT? We can envision several possibilities. First, the L1 EN might produce double-stranded staggered cuts rather than nicks. In vitro, plasmid DNA can indeed be linearized upon prolonged incubation with an isolated recombinant EN domain. Whether L1 ORF2p acts as a monomer or a multimer in the context of the L1 RNP is unknown. However, many other reverse transcriptases form dimers, including the R2 RT and human telomerase. Dimerization of ORF2p could lead to concomitant cleavage of bottom and top strands, while maintaining the two target-site DNA extremities together (Fig. 3A). Of note, the average length of the target-site duplication, which reflects the distance between the top and bottom strand cuts, is 15 nucleotides, a length compatible with the minimal size of the single-stranded region (6 nt). A second hypothetical mechanism involves a strand-transfer or a DNA helicase activity (Fig. 3B). Although ORF1p has been proposed to perform this task through its nucleic acid chaperone activity, the presence of this protein in native L1 RNPs was not sufficient under our experimental conditions to prime reverse transcription with duplex DNA substrates. Recent efforts have identified a number of cellular